Predicting Student Admission to University

Tags: Python, Machine Learning, Classification, Logistic Regression, Streamlit

Project Overview

This project predicts the likelihood of a student being admitted to a university based on key academic and personal attributes. The model assists applicants in assessing their admission chances and provides universities with a data-driven approach to evaluating candidates.

Key Insights

GRE, TOEFL, CGPA, and research experience strongly influence admission probability.
University rating, SOP, and LOR have a moderate impact on admission chances.
Logistic Regression was chosen for its efficiency, interpretability, and suitability for binary classification.
Feature scaling with StandardScaler improved model performance by putting all numerical features on a comparable scale.
Achieved 94% accuracy and a 91% F1-score, indicating a good balance between precision and recall.
Reached a ROC AUC of 0.94, indicating a strong ability to distinguish admitted from non-admitted applicants.
Deployed an interactive Streamlit web app for real-time user predictions.

Technical Implementation

Data Preprocessing:
- Checked for missing values and handled inconsistencies.
- Scaled numerical features using StandardScaler for better model performance.
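
The scaling step can be illustrated without any library. The sketch below is a minimal, standard-library-only version of what StandardScaler computes: each column is shifted by its mean and divided by its population standard deviation. The sample rows and the helper names (count_missing, fit_scaler, transform) are hypothetical and are not taken from the project.

```python
from statistics import fmean, pstdev


def count_missing(rows):
    """Count cells that are None, a simple stand-in for a missing-value check."""
    return sum(1 for row in rows for value in row if value is None)


def fit_scaler(rows):
    """Return one (mean, std) pair per column, using the population std."""
    return [(fmean(col), pstdev(col)) for col in zip(*rows)]


def transform(rows, params):
    """Apply z = (x - mean) / std per column; constant columns become 0.0."""
    return [
        [(x - mean) / std if std > 0 else 0.0 for x, (mean, std) in zip(row, params)]
        for row in rows
    ]


# Hypothetical sample rows: [GRE score, TOEFL score, CGPA]
rows = [
    [320, 110, 8.5],
    [300, 100, 7.0],
    [340, 120, 9.5],
    [310, 105, 8.0],
]

missing = count_missing(rows)   # number of None cells in the sample
params = fit_scaler(rows)       # learn per-column mean and std
scaled = transform(rows, params)
```

After the transform, every column has mean 0 and standard deviation 1. In practice the scaler is usually fitted on the training split only and then reused for the test split and for new inputs.
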
Model Selection:
- Implemented Logistic Regression, a reliable algorithm for binary classification.
- Used GridSearchCV for hyperparameter tuning to optimize performance.
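
As a rough illustration of what the tuning step does, the sketch below hand-rolls a small L2-regularized logistic regression (batch gradient descent) and a plain k-fold grid search over the regularization strength C. It is a standard-library stand-in, not the project's code: with scikit-learn the same idea is expressed by wrapping LogisticRegression in GridSearchCV. The toy data, the grid values, and all function names are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def decision(w, b, x):
    """Linear score w . x + b for one sample."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_logistic(X, y, C=1.0, lr=0.1, epochs=300):
    """Batch gradient descent on mean log-loss plus an L2 penalty scaled by 1/C."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(decision(w, b, xi)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + w[j] / (C * n))
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Label a sample 1 when its predicted probability is at least 0.5."""
    w, b = model
    return [1 if sigmoid(decision(w, b, x)) >= 0.5 else 0 for x in X]


def cv_accuracy(X, y, C, k=3):
    """Mean accuracy over k folds; fold i holds out rows i, i + k, i + 2k, ..."""
    scores = []
    for i in range(k):
        held_out = set(range(i, len(X), k))
        X_tr = [x for j, x in enumerate(X) if j not in held_out]
        y_tr = [t for j, t in enumerate(y) if j not in held_out]
        X_te = [x for j, x in enumerate(X) if j in held_out]
        y_te = [t for j, t in enumerate(y) if j in held_out]
        preds = predict(fit_logistic(X_tr, y_tr, C=C), X_te)
        scores.append(sum(p == t for p, t in zip(preds, y_te)) / len(y_te))
    return sum(scores) / k


def grid_search(X, y, grid):
    """Score every candidate C by cross-validation and return the best one."""
    results = {C: cv_accuracy(X, y, C) for C in grid}
    return max(results, key=results.get), results


# Hypothetical, already-standardized features: [GRE, CGPA]; 1 = admitted
X = [[-1.5, -1.2], [-1.0, -0.8], [-0.7, -1.1], [-0.4, -0.3],
     [-0.2, -0.6], [0.1, -0.1], [0.2, 0.4], [0.5, 0.3],
     [0.8, 0.9], [1.0, 0.7], [1.3, 1.2], [1.6, 1.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

grid = [0.01, 0.1, 1.0, 10.0]  # placeholder values for C
best_C, results = grid_search(X, y, grid)
```

As in scikit-learn's LogisticRegression, a smaller C means stronger regularization. The project's actual parameter grid is not stated, so the values above are placeholders.
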
Model Evaluation:
- Measured accuracy, precision, recall, and F1-score for performance assessment.
- Generated a confusion matrix and ROC curve to evaluate classification performance.
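
To make the relationship between these metrics concrete, the following sketch derives accuracy, precision, recall, and F1 from confusion-matrix counts, and computes ROC AUC as the fraction of (positive, negative) pairs that the scores rank correctly. The labels and scores are made up for illustration and do not reproduce the project's reported results; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means admitted."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Share of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities for eight applicants
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

counts = confusion_counts(y_true, y_pred)        # (3, 1, 1, 3)
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)                    # 0.875
```

Accuracy, precision, recall, and F1 all depend on the 0.5 threshold, while ROC AUC is computed from the raw scores, which is why the two kinds of numbers can differ on the same model.
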
Deployment:
- Developed a Streamlit web app for real-time predictions.
- Packaged the model using Pickle for efficient loading and inference.
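
The packaging step might look roughly like the sketch below: the fitted parameters are written once with pickle, and the app reloads them at startup and calls a prediction function with the user's inputs. Everything here is an assumption for illustration: the artifact is a plain dictionary standing in for the trained estimator, the parameter values are invented, and only two features are used. In the Streamlit app, a function like predict_probability would be called with values collected from the input widgets.

```python
import math
import os
import pickle
import tempfile

# Hypothetical fitted parameters; a real artifact would hold the trained objects.
artifact = {
    "feature_names": ["gre", "cgpa"],
    "means": [316.0, 8.5],
    "stds": [11.0, 0.6],
    "weights": [1.2, 1.8],
    "bias": -0.3,
}


def save_artifact(obj, path):
    """Serialize the model artifact to disk."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_artifact(path):
    """Load the artifact; only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_probability(model, features):
    """Standardize the inputs, apply the linear model, and squash with a sigmoid."""
    z = model["bias"]
    for name, mean, std, weight in zip(
        model["feature_names"], model["means"], model["stds"], model["weights"]
    ):
        z += weight * (features[name] - mean) / std
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Round trip through a temporary file, as the app would do at startup
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")
    save_artifact(artifact, model_path)
    loaded = load_artifact(model_path)

p_strong = predict_probability(loaded, {"gre": 330, "cgpa": 9.2})
p_weak = predict_probability(loaded, {"gre": 300, "cgpa": 7.5})
```

Storing the scaling parameters together with the model weights keeps inference consistent with training, since new inputs must be standardized with the same means and standard deviations.
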

Key Learnings

Feature engineering is crucial: Selecting and scaling the right features significantly improves model performance.
Logistic Regression works well for binary classification with interpretable results.
Model evaluation should go beyond accuracy: Precision, recall, and F1-score provide deeper insights into performance.
Pipeline automation enhances efficiency and makes the workflow more reproducible.
Deploying ML models using Streamlit allows users to interact with predictions easily.
Real-world applications require continuous improvement, such as expanding the dataset or testing other models for further enhancement.
